2019-03-18

Welcome

Welcome to today's R-Ladies MTL session. As a passionate R-Lady, I feel incredibly connected to the community we've created here. Every month, we get to sit down, code, learn and share in an open and welcoming environment.

I get a lot out of these meetups, and since you all keep coming, I take it you do too. But I also wonder whether we can see some macro-level impacts of informal networking groups like this.

In today's session, I'll be presenting some data on women in tech groups (like this one) and whether there is evidence that they've had an impact on the lives of women in tech.

Research question

Are women in tech groups associated with more women in the tech industry and greater pay equity? An ecological analysis

Method

Data sources

We collected data from several sources:

  • Meetup.com: RSVPs for all "women in tech" group events

    + Over 1 million meetup events from all over the world between 2002 and 2019

    + Over 50k women in tech groups between 2002 and 2019

  • Publicly available global figures on women in tech, salaries, and wage gaps

    + 41 countries, cross-sectional

    + Data collected from global reports, Eurostat, OECD, UNESCO’s Institute for Statistics database, World Economic Forum Report, ILO, and the ILOSTAT database

Scrape the meetup.com data

Package: meetupr. Source: https://github.com/rladies/meetupr

library(meetupr)
library(dplyr)

Sys.setenv(MEETUP_KEY = "PASTE YOUR MEETUP KEY HERE")

# Slow down a function so we don't hit the Meetup rate limit. Adapted from:
# https://github.com/rladies/meetupr/issues/30#issuecomment-379900167
slowly <- function(f, delay = 0.25) {
    function(...) {
        Sys.sleep(delay)
        f(...)
    }
}

# Wrap 'get_events' to link the parent group into the result data
get_group_events <- function(group) {
    events <- get_events(group, event_status = "past")
    # Return the events with the group's URL name attached
    events %>% mutate(group_url = group)
}
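To see what the slowly() wrapper does without calling the Meetup API, here is a minimal sketch that wraps a stand-in function. The function fake_request and the 0.1 second delay are made up for illustration; only slowly() comes from the code above.

```r
# Same wrapper as above: returns a new function that pauses before each call
slowly <- function(f, delay = 0.25) {
  function(...) {
    Sys.sleep(delay)
    f(...)
  }
}

# Hypothetical stand-in for an API call (not part of meetupr)
fake_request <- function(x) toupper(x)

slow_request <- slowly(fake_request, delay = 0.1)

# Each of the three calls pauses for roughly 0.1 seconds before running
timing <- system.time(
  results <- vapply(c("a", "b", "c"), slow_request, character(1))
)

print(unname(results))
print(timing[["elapsed"]])
```

The wrapped function takes the same arguments as the original, which is why it can be dropped straight into a call to map().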

Classify "tech" meetups

girls                   | biotech     | python
female                  | data mining | PHP
big data                | analytics   | swift
blockchain              | nerd        | ruby
machine learning        | geek        | web dev
artificial intelligence | code        | webdev
virtual reality         | develop     | game dev
augmented reality       | javascript  | gamedev
biotech                 | html        | unity
data mining             | java        | code
analytics               | nerd        | fintech
NA                      | geek        | NA
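The slides do not show how these keywords were applied, so the following is only one plausible sketch: build a case-insensitive pattern from a few of the keywords and flag group names that match both a "women" term and a "tech" term. The helper has_any(), the two-list rule, and all example names except "Getting Girls to Code" are assumptions made for illustration.

```r
# A few keywords taken from the table above (not the full list)
women_terms <- c("girls", "female")
tech_terms  <- c("python", "data mining", "machine learning", "develop", "code")

# Hypothetical helper: TRUE where a name contains any of the terms
has_any <- function(x, terms) {
  pattern <- paste(terms, collapse = "|")
  grepl(pattern, x, ignore.case = TRUE)
}

group_names <- c(
  "Getting Girls to Code (Supporting Made with Code by Google)",
  "Female Python Developers",        # made-up name
  "Machine Learning Study Group",    # made-up name
  "Downtown Hiking Club"             # made-up name
)

is_women <- has_any(group_names, women_terms)
is_tech  <- has_any(group_names, tech_terms)

print(data.frame(name = group_names, women_in_tech = is_women & is_tech))
```

Plain substring matching like this can over-match (for example, "code" inside a longer word), so a real classification would probably need word boundaries or manual review.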

Filtering tech groups

# Filter tech groups and drop duplicates (just in case)
unique_groups <- all_results %>% 
  select(id, name, urlname, created, members, status, city, state, country,
         who, organizer_id, organizer_name, category_id, category_name) %>%
  filter(category_name == "Tech") %>%
  distinct()

# Save results (without row names, so the file reads back with the same columns)
write.csv(unique_groups, "groups.csv", row.names = FALSE)

Finding Events

library(dplyr)
library(purrr)
library(readr)

unique_groups <- read.csv("groups.csv")

# Split the requests into chunks in case we disconnect partway
chunk_size <- 100
number_chunks <- ceiling(nrow(unique_groups) / chunk_size)

# Warning: This will take approximately 18 years
for (offset in seq_len(number_chunks)) {
  # Row range for this chunk: 1-100, 101-200, ... (capped at the last row)
  start <- 1 + (offset - 1) * chunk_size
  end <- min(offset * chunk_size, nrow(unique_groups))
  print(offset)

  # Get the groups in the current chunk
  group_urls <- unique_groups %>%
    select(urlname) %>%
    slice(start:end)

  # Request all the events for each group
  event_results <- map(group_urls$urlname, slowly(safely(get_group_events)))

  # Guard against the end of the list
  if (length(event_results) > 0) {
    filtered_results <- event_results %>%
      map("result") %>%
      bind_rows() %>%
      select(id, name, local_date, yes_rsvp_count, group_url)

    # Save our progress to disk in case we error out somewhere
    write_csv(filtered_results, "events.csv", append = TRUE)
  }
}
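The loop above processes the groups in chunks of 100 rows. As a sanity check on the index arithmetic, here is a small self-contained sketch of how row indices can be split into chunks. It uses a made-up total of 250 rows and a hypothetical helper chunk_bounds(); nothing here calls the Meetup API.

```r
# Hypothetical helper: first and last row index of each chunk
chunk_bounds <- function(n_rows, chunk_size) {
  n_chunks <- ceiling(n_rows / chunk_size)
  chunk <- seq_len(n_chunks)
  data.frame(
    chunk = chunk,
    start = (chunk - 1) * chunk_size + 1,
    end   = pmin(chunk * chunk_size, n_rows)  # cap the last chunk at n_rows
  )
}

# Made-up example: 250 rows in chunks of 100
bounds <- chunk_bounds(n_rows = 250, chunk_size = 100)
print(bounds)

# Every row is covered exactly once
print(sum(bounds$end - bounds$start + 1) == 250)
```

Checking that the first chunk starts at row 1 and the last one ends at the final row is a cheap way to catch off-by-one mistakes before starting a long scrape.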

Let's explore the data

name                                                        | members | city            | state | country | who           | year
Girl Develop It Boise                                       | 549     | Boise           | ID    | US      | Techies       | 2014
Girl Develop It West Palm Beach                             | 885     | West Palm Beach | FL    | US      | Nerdettes     | 2014
Girl Develop It Atlanta                                     | 2645    | Atlanta         | GA    | US      | Developers    | 2014
Girls in Tech - Los Angeles                                 | 1062    | Santa Monica    | CA    | US      | GITLA Techies | 2014
Girl Develop It NYC                                         | 14519   | New York        | NY    | US      | Nerdettes     | 2010
Getting Girls to Code (Supporting Made with Code by Google) | 1624    | Sydney          |       | AU      | coders        | 2014
Girl Develop It Philadelphia                                | 5544    | Philadelphia    | PA    | US      | students      | 2011

Women in tech groups over time

R-Ladies over time

Women in tech worldwide

source: https://www.honeypot.io/women-in-tech-2018/

Sub-major group 25 of the International Standard Classification of Occupations (ISCO-08).

The main components of this section are publishing activities, including software publishing (division 58), motion picture and sound recording activities (division 59), radio and TV broadcasting and programming activities (division 60), telecommunications activities (division 61) and information technology activities (division 62) and other information service activities (division 63). Source: Eurostat.

The tech community: globally

Women in tech - Stack Overflow, globally

When making maps, your data may identify countries by 2-character codes, 3-character codes, or full names. The rworldmap package can join data to a map using any of these, and the handy countrycode package converts almost any country naming scheme to another. See ?codelist (in countrycode) for all options.

library(rworldmap)

# Match country names in the survey to the country names in the package
matched <- joinCountryData2Map(tech_jobs, joinCode = "NAME", nameJoinColumn = "country", verbose = TRUE)
## 41 codes from your data successfully matched countries in the map
## 0 codes from your data failed to match with a country code in the map
##      failedCodes failedCountries
## 202 codes from the map weren't represented in your data
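As a quick illustration of converting between country naming schemes, the sketch below uses the countrycode() function from the countrycode package (the same function is used later in this talk). It assumes countrycode is installed; the three example codes are arbitrary.

```r
library(countrycode)

codes <- c("CA", "US", "FR")

# 2-letter ISO codes to English country names
full_names <- countrycode(codes, origin = "iso2c", destination = "country.name")

# 2-letter ISO codes to 3-letter ISO codes
iso3 <- countrycode(codes, origin = "iso2c", destination = "iso3c")

print(data.frame(iso2c = codes, iso3c = iso3, name = full_names))
```

Converting everything to one scheme before joining makes it much easier to spot countries that fail to match.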

Very ugly plot

Let's try to make it a bit prettier

Meetup tech groups and women in tech

  • Create a data frame of the number of groups, by year and country

  • We'll only keep 2016 since the data for tech jobs is cross-sectional

  • Convert the 2-letter iso2c country codes to full-length country names

  • Merge the tech jobs & meetup tech groups!

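library(dplyr)
library(magrittr)    # for %<>%
library(countrycode)

# Count the number of groups per country for 2016
groups_jobs <- groups %>% 
        filter(year == 2016) %>% 
        group_by(country) %>%
        summarise(groups.total = n()) %>% 
        rename(country.abb = country) %>% 
        ungroup()

# Convert 2-letter ISO codes to full country names
groups_jobs$country <- countrycode(groups_jobs$country.abb, "iso2c", "country.name")

# Merge with the tech jobs data and drop countries with missing values
groups_jobs %<>% left_join(tech_jobs, by = "country") %>% 
        na.omit() %>% data.frame()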
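One consequence of the merge step is worth spelling out: a left join followed by na.omit() drops every country that has no row in the jobs data, so the result contains only countries present in both tables. The sketch below shows this on toy data frames with placeholder values (not real data); toy_groups and toy_jobs are hypothetical names, and dplyr is assumed to be installed.

```r
library(dplyr)

# Toy stand-ins with placeholder values (not real data)
toy_groups <- data.frame(country = c("A", "B", "C"),
                         groups.total = c(1, 2, 3),
                         stringsAsFactors = FALSE)
toy_jobs <- data.frame(country = c("A", "B"),
                       tech_perc_women = c(10, 20),
                       stringsAsFactors = FALSE)

# Left join keeps all three countries; "C" gets NA for tech_perc_women
merged <- toy_groups %>%
  left_join(toy_jobs, by = "country") %>%
  na.omit()

# An inner join keeps the same countries
same_as_inner <- inner_join(toy_groups, toy_jobs, by = "country")

print(merged)
print(identical(merged$country, same_as_inner$country))
```

Note that na.omit() also drops rows with missing values in any other column, so on real data it can remove more rows than an inner join would.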

Create plot of jobs and tech groups

library(ggplot2)
library(ggrepel)  # for geom_text_repel()

# Scatter plot of group counts against the share of women in tech, labelled by country
plot.groups <- ggplot(data = groups_jobs, aes(x = groups.total, y = tech_perc_women)) +
        geom_point() + 
        geom_text_repel(aes(label = country.abb), size = 3) + 
        labs(
                title = "Number of Women's Tech Groups and Women in Tech Jobs, by Country",
                caption = "Source: https://www.honeypot.io/women-in-tech-2018/",
                x = "Women's Tech Meetup Groups",
                y = "Percentage of women in tech") +
        guides(size = "none") +
        theme_bw() +
        theme(panel.grid.major.x = element_blank(),
              legend.position = "none")

Women's tech groups and jobs, by country

Future directions

I'd like to keep exploring the data and wanted to ask for your help. It might be a fun idea to have an R-Ladies Montreal group analysis on … ourselves. All the code and data will be available on GitHub. Please fork the repo and contribute in any way you'd like. The ultimate goal would be to write a short editorial and have it published in a journal.

So let's bounce some ideas around and see what we come up with!